Programming Massively Parallel Processors: A Practical Introduction: Beyond the Sequential Ceiling

The End of the 'Free Lunch'

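For decades, developers enjoyed a "free lunch": Dennard scaling meant that each new chip generation delivered higher clock speeds, so sequential code ran faster without any changes. That era ended when we hit the Power Wall, the point often described as the "sequential ceiling". Performance is no longer a function of frequency; it is a function of concurrency. To move forward, we need computational thinking to bridge the gap between abstract numerical methods and the modern parallel execution model.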

The Trade-off Between Precision and Performance

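Moving a domain problem (for example, molecular dynamics) from a multi-core host to a CUDA device is more than a change of syntax. It is a change in problem decomposition. Parallelizing a computation frequently changes the order in which operations are performed. Because floating-point arithmetic is not associative, this reordering changes how rounding errors accumulate, so we face a trade-off involving floating-point precision and accuracy: a parallel result can be mathematically valid and still differ numerically from its sequential counterpart.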
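
The effect of operation order can be illustrated with a minimal sketch in plain Python rather than CUDA. The helper names `sequential_sum` and `pairwise_sum` and the input values are hypothetical, chosen only for illustration. The pairwise version mimics the tree-shaped order that a parallel reduction typically uses, and `math.fsum` from the standard library provides a correctly rounded reference.

```python
import math


def sequential_sum(values):
    """Accumulate left to right, as a single-threaded loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


def pairwise_sum(values):
    """Sum in a tree-shaped order, similar to a parallel reduction."""
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])


# Hypothetical contributions with very different magnitudes (illustrative only).
values = [1e16, 1.0, -1e16, 1.0]

seq = sequential_sum(values)   # ((1e16 + 1.0) + -1e16) + 1.0
par = pairwise_sum(values)     # (1e16 + 1.0) + (-1e16 + 1.0)
exact = math.fsum(values)      # correctly rounded reference sum

print(seq, par, exact)  # prints "1.0 0.0 2.0"
```

Both orderings are legitimate ways to add the same four numbers, yet each loses a different part of the small terms to rounding. Real simulations rarely show differences this large, but the same mechanism explains why a parallel run may not match a sequential one bit for bit.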

QUESTION 1

What is the primary reason the 'Sequential Ceiling' was reached?

The end of Moore's Law entirely.

Thermal limits and the Power Wall hindering frequency scaling.

Lack of developer interest in C++.

The transition to quantum computing.

QUESTION 2

According to Amdahl's Law, if 5% of a program is strictly sequential, what is the maximum theoretical speedup?

Infinite speedup.

Approximately 20x.

5x.

100x.

QUESTION 3

Why might a parallel Molecular Dynamics simulation yield slightly different results than a sequential one?

The CPU uses 64-bit while the GPU only uses 8-bit.

Floating-point addition is not associative, so reordering operations in parallel changes the rounding.

Parallel threads randomly skip calculations.

The CUDA compiler ignores numerical methods.

QUESTION 4

What does 'Problem Decomposition' involve in the context of parallel programming?

Breaking code into functions for readability.

Mapping domain-specific data to parallel execution models like threads or grids.

Deleting unnecessary variables to save memory.

Compiling the code for multiple OS targets.

QUESTION 5

Which of the following describes the 'Computational Thinking' bridge?

A hardware component between the CPU and GPU.

A framework to translate domain knowledge into architecture-aware algorithms.

An automated AI tool that writes CUDA kernels.

The process of upgrading RAM on a host machine.